A Corpus of Tables in Full-Text Biomedical Research Publications
Abstract
The development of text mining techniques for biomedical research literature has received increased attention in recent times. However, most of these techniques focus on prose, while much important biomedical data reside in tables. In this paper, we present a corpus created to serve as a gold standard for the development and evaluation of techniques for the automatic extraction of information from biomedical tables. We describe the guidelines used for corpus annotation and the manner in which they were developed. The high inter-annotator agreement achieved on the corpus, and the generic nature of our annotation approach, suggest that the developed guidelines can serve as a general framework for table annotation in biomedical and other scientific domains. The annotated corpus and the guidelines are available at http://www.csse.monash.edu.au/research/umnl/data/index.shtml.
Similar resources
Challenges in Information Extraction from Tables in Biomedical Research Publications: a Dataset Analysis
We present a study of a dataset of tables from biomedical research publications. Our aim is to identify characteristics of biomedical tables that pose challenges for the task of extracting information from tables, and to determine which parts of research papers typically contain information that is useful for this task. Our results indicate that biomedical tables are hard to interpret without t...
BRONCO: Biomedical entity Relation ONcology COrpus for extracting gene-variant-disease-drug relations
Comprehensive knowledge of genomic variants in a biological context is key for precision medicine. As next-generation sequencing technologies improve, the amount of literature containing genomic variant data, such as new functions or related phenotypes, rapidly increases. Because numerous articles are published every day, it is almost impossible to manually curate all the variant information fr...
Structured digital tables on the Semantic Web: toward a structured digital literature
In parallel to the growth in bioscience databases, biomedical publications have increased exponentially in the past decade. However, the extraction of high-quality information from the corpus of scientific literature has been hampered by the lack of machine-interpretable content, despite text-mining advances. To address this, we propose creating a structured digital table as part of an overall ...
PubRunner: A light-weight framework for updating text mining
Biomedical text mining promises to assist biologists in quickly navigating the combined knowledge in their domain. This would allow improved understanding of the complex interactions within biological systems and faster hypothesis generation. New biomedical research articles are published daily and text mining tools are only as good as the corpus from which they work. Many text mining tools are...
Future competencies for hospital management in developing countries: Systematic review
Background: This was a systematic review presenting the future competencies for hospital managers. Methods: Participants, interventions, comparisons and outcomes (PICO) strategy with MeSH terms were used for searching. Databases used were Web of Science, PsycINFO and Medline, EBSCO, ScienceDirect, Emerald, ProQuest, Social Sciences Research Network, Embase, and some Iranian database su...
Publication date: 2016